Statistical Programming in R

We use the following packages:

library(MASS)     # Datasets
library(mice)     # Boys dataset
library(dplyr)    # Data manipulation
library(magrittr) # Pipes
library(ggplot2)  # Plotting suite
library(sf)       # Spatial features

Visualising in R

  • R makes it very easy to visualise data
  • But fine-tuning figures to specific standards can take a lot of time

Why visualise?

  • We can process a lot of information quickly with our eyes
  • Plots are more intuitively accessible to non-specialists
  • Plots give us information about
    • Distribution / shape
    • Irregularities
    • Assumptions
    • Intuitions

What we will do

  • A few plots in base graphics in R
  • Plotting with ggplot2 graphics
  • Plotting data on maps

Base graphics in R

First, recall:

  • Vectors
height <- c(50.1, 53.5, 50.0, 54.5, 57.5)
weight <- c(3.65, 3.37, 3.14, 4.27, 5.03)
  • Data frames
boys <- boys   # copy the boys data (from mice) into the workspace
head(boys)
##      age  hgt   wgt   bmi   hc  gen  phb tv   reg
## 3  0.035 50.1 3.650 14.54 33.7 <NA> <NA> NA south
## 4  0.038 53.5 3.370 11.77 35.0 <NA> <NA> NA south
## 18 0.057 50.0 3.140 12.56 35.2 <NA> <NA> NA south
## 23 0.060 54.5 4.270 14.37 36.7 <NA> <NA> NA south
## 28 0.062 57.5 5.030 15.21 37.3 <NA> <NA> NA south
## 36 0.068 55.5 4.655 15.11 37.0 <NA> <NA> NA south

To call a variable in the data frame, use the $ notation:

boys$hgt[1:10]
##  [1] 50.1 53.5 50.0 54.5 57.5 55.5 52.5 53.0 55.1 54.5
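
As a small sketch of the same idea, `$` is one of several equivalent ways to pull a column out of a data frame. The snippet assumes the mice package is installed; the object names `by_dollar` and `by_bracket` are only illustrative.

```r
# Assumes the mice package (which ships the boys data) is installed
boys <- mice::boys

by_dollar  <- boys$hgt        # extract the column by name with $
by_bracket <- boys[["hgt"]]   # same column, name given as a string

# Both routes return the same numeric vector
identical(by_dollar, by_bracket)

# A column is an ordinary vector, so the usual functions apply
mean(boys$hgt, na.rm = TRUE)
```

The `[[ ]]` form is handy when the column name is stored in a variable, which `$` cannot handle.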

Scatter plot

plot(x = boys$hgt, y = boys$wgt, main = "Scatter plot", 
     xlab = "Height", ylab = "Weight", bty = "L")

Line chart

plot(x = 1:5, y = exp(1:5), type = "l", main = "Line chart", bty = "L")

Bar chart

counts <- table(boys$reg)

barplot(counts, main = "Bar chart", ylab = "N")

Pie chart

counts <- table(boys$reg)

pie(x = counts, main = "Pie chart")

Histogram

hist(boys$hgt, main = "Histogram", xlab = "Height")

Box plot

boxplot(boys$hgt ~ boys$reg, main = "Boxplot", 
        xlab = "Region", ylab = "Height")

But what if we want more control?

ggplot2

What is ggplot2?

Layered plotting based on the book The Grammar of Graphics by Leland Wilkinson.

With ggplot2 you

  1. provide the data
  2. define how to map variables to aesthetics
  3. state which geometric object to display
  4. (optional) edit the overall theme of the plot

ggplot2 then takes care of the details

An example: scatterplot

1: Provide the data

boys %>%
  ggplot()

2: Map variables to aesthetics

boys %>%
  ggplot(aes(x = age, y = bmi))

3: State which geometric object to display

boys %>%
  ggplot(aes(x = age, y = bmi)) +
  geom_point()

Why this syntax?

Create the plot

gg <- 
  boys %>%
  ggplot(aes(x = age, y = bmi)) +
  geom_point()

Add another layer (smooth fit line)

gg <- gg + 
  geom_smooth(col = "dark blue")

Give it some labels and a nice look

gg <- gg + 
  labs(x = "Age", y = "BMI", title = "BMI trend for boys") +
  theme_minimal()

plot(gg)
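
One way to see why the `+` syntax is useful: the plot is an ordinary R object, and each `+` returns a new object with one more layer. The sketch below assumes the mice package is installed. Inspecting `gg$layers` relies on the internal structure of the plot object, which is an assumption that may not hold in every ggplot2 version.

```r
library(ggplot2)

boys <- mice::boys  # assumes the mice package is installed

# Start with a single layer of points; nothing is drawn until we print/plot
gg <- ggplot(boys, aes(x = age, y = bmi)) +
  geom_point()
n_before <- length(gg$layers)   # layers stored in the plot object so far

# Adding a geom returns a new plot object with one more layer
gg <- gg + geom_smooth()
n_after <- length(gg$layers)

c(before = n_before, after = n_after)
```

Because the plot is stored rather than drawn, it can be extended, reused, or saved before it is ever displayed.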

Aesthetics

  • x
  • y
  • size
  • colour
  • fill
  • opacity (alpha)
  • linetype
  • …
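
A minimal sketch of the difference between mapping an aesthetic to a variable and setting it to a fixed value, assuming the mice package is installed. The object names `p_mapped` and `p_set` are illustrative only.

```r
library(ggplot2)

boys <- mice::boys  # assumes the mice package is installed

# Mapped: colour varies with the data, so it goes inside aes()
p_mapped <- ggplot(boys, aes(x = age, y = bmi, colour = reg)) +
  geom_point()

# Set: one fixed colour and opacity for all points, so they go outside aes()
p_set <- ggplot(boys, aes(x = age, y = bmi)) +
  geom_point(colour = "darkblue", alpha = 0.5)
```

Mapped aesthetics get a legend automatically; fixed values do not, because they carry no information about the data.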

gg <- 
  boys %>% 
  filter(!is.na(reg)) %>% 
  
  ggplot(aes(x      = age, 
             y      = bmi, 
             size   = hc, 
             colour = reg)) +
  
  geom_point(alpha = 0.5) +
  
  labs(title  = "BMI trend for boys",
       x      = "Age", 
       y      = "BMI", 
       size   = "Head circumference",
       colour = "Region") +
  theme_minimal()

plot(gg)

Geoms

  • geom_point
  • geom_bar
  • geom_line
  • geom_smooth

  • geom_histogram
  • geom_boxplot
  • geom_density

Geoms: Bar
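
A minimal sketch of a bar geom, assuming the mice package is installed. It is an illustration in the style of the earlier examples and not necessarily the plot originally shown under this heading.

```r
library(ggplot2)

boys <- mice::boys  # assumes the mice package is installed

# geom_bar() counts the rows in each category of x;
# rows with a missing region are dropped first
p_bar <- ggplot(subset(boys, !is.na(reg)), aes(x = reg)) +
  geom_bar() +
  labs(x = "Region", y = "N", title = "Bar chart") +
  theme_minimal()

p_bar
```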

Geoms: Line
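
A minimal sketch of a line geom that mirrors the base-graphics line chart from earlier. The data frame `curve_data` is a hypothetical name introduced here, and the plot is not necessarily the one originally shown.

```r
library(ggplot2)

# Same values as the base-graphics line chart: y = exp(x) for x = 1, ..., 5
curve_data <- data.frame(x = 1:5, y = exp(1:5))

# geom_line() connects the observations in order of x
p_line <- ggplot(curve_data, aes(x = x, y = y)) +
  geom_line() +
  labs(title = "Line chart") +
  theme_minimal()

p_line
```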

Geoms: Smooth
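
A minimal sketch of a smooth geom drawn on top of the raw points, assuming the mice package is installed. The loess smoother is chosen here for illustration; the original slide may have used different variables or settings.

```r
library(ggplot2)

boys <- mice::boys  # assumes the mice package is installed

# Points in the background, a loess curve with its confidence band on top
p_smooth <- ggplot(boys, aes(x = age, y = hgt)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "loess", formula = y ~ x, colour = "darkblue") +
  labs(x = "Age", y = "Height") +
  theme_minimal()
```

Printing `p_smooth` draws the plot; rows with missing height are dropped with a warning.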

Geoms: Boxplot
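
A minimal sketch of a boxplot geom, the ggplot2 counterpart of the base `boxplot()` call shown earlier. It assumes the mice package is installed and is not necessarily the plot originally shown.

```r
library(ggplot2)

boys <- mice::boys  # assumes the mice package is installed

# One box per region; rows with a missing region are dropped first
p_box <- ggplot(subset(boys, !is.na(reg)), aes(x = reg, y = hgt)) +
  geom_boxplot() +
  labs(x = "Region", y = "Height", title = "Boxplot") +
  theme_minimal()
```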

Helpful link in RStudio

Maps

Simple Features

  • A formal standard (ISO 19125-1:2004) that describes how objects in the real world can be represented in computers, with emphasis on the spatial geometry of these objects.
  • Widely implemented, e.g. in ArcGIS
  • Implemented for R in the sf package
  • Feature geometries are stored in a list-column of a data.frame

We have time for a cursory introduction at most.
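
To make the "geometries in a data.frame" idea concrete, here is a minimal sketch that builds an sf object from an ordinary data frame. The point names and coordinates are made up for illustration, and the CRS (EPSG:4326, i.e. longitude/latitude) is an assumption.

```r
library(sf)

# Hypothetical example data: three points with illustrative coordinates
pts <- data.frame(
  name = c("A", "B", "C"),
  lon  = c(12.5, 10.2, 10.4),
  lat  = c(55.7, 56.2, 55.4)
)

# Turn the coordinate columns into a geometry column (assumed CRS: EPSG:4326)
pts_sf <- st_as_sf(pts, coords = c("lon", "lat"), crs = 4326)

class(pts_sf)                # an sf object is also a data.frame
class(st_geometry(pts_sf))   # the geometries form a list-column of class sfc
```

Because an sf object is still a data frame, the usual tools (`$`, `filter()`, `ggplot()`) keep working, and the geometry column travels along with the attributes.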

Reading in spatial data

denmark <- st_read("DK_map.shp")
plot(st_geometry(denmark))
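
The file `DK_map.shp` is specific to this course. As a sketch that runs without it, the same two steps can be tried on the North Carolina shapefile that ships with the sf package. The object names `nc_path` and `nc` are introduced here for illustration.

```r
library(sf)

# The nc shapefile is bundled with sf, so no external file is needed
nc_path <- system.file("shape/nc.shp", package = "sf")

# quiet = TRUE suppresses the summary that st_read() normally prints
nc <- st_read(nc_path, quiet = TRUE)

class(nc)               # "sf" "data.frame"
plot(st_geometry(nc))   # county outlines only, without attributes
```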

Plotting regional attributes

denmark$proportion.over.70 <- denmark$over70/denmark$population

plot(denmark["proportion.over.70"],
     main = "Proportion of population aged 70 years and above")

Or we can use ggplot2

denmark %>% ggplot(aes(fill=proportion.over.70)) + geom_sf()
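
A runnable sketch of the same pattern, using the North Carolina data bundled with sf instead of the course's Denmark file. The derived column `sids_per_1000` is a name introduced here; it is computed from the `SID74` and `BIR74` columns of that dataset. The viridis fill scale and the theme are styling choices, not part of the original example.

```r
library(sf)
library(ggplot2)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Derived regional attribute, analogous to proportion.over.70:
# SIDS cases per 1,000 births in the 1974 period
nc$sids_per_1000 <- 1000 * nc$SID74 / nc$BIR74

# geom_sf() draws the geometry column; fill is mapped like any other aesthetic
p_map <- ggplot(nc) +
  geom_sf(aes(fill = sids_per_1000)) +
  scale_fill_viridis_c(name = "Cases per 1,000") +
  labs(title = "SIDS cases per 1,000 births") +
  theme_minimal()

p_map
```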

Practical